3D SMoSIFT: three-dimensional sparse motion scale invariant feature transform for activity recognition from RGB-D videos

نویسندگان

  • Jun Wan
  • Qiuqi Ruan
  • Wei Li
  • Gaoyun An
  • Ruizhen Zhao
چکیده

Human activity recognition based on RGB-D data has received more attention in recent years. We propose a spatiotemporal feature named three-dimensional (3D) sparse motion scale-invariant feature transform (SIFT) from RGB-D data for activity recognition. First, we build pyramids as scale space for each RGB and depth frame, and then use Shi-Tomasi corner detector and sparse optical flow to quickly detect and track robust keypoints around the motion pattern in the scale space. Subsequently, local patches around keypoints, which are extracted from RGB-D data, are used to build 3D gradient and motion spaces. Then SIFT-like descriptors are calculated on both 3D spaces, respectively. The proposed feature is invariant to scale, transition, and partial occlusions. More importantly, the running time of the proposed feature is fast so that it is well-suited for real-time applications. We have evaluated the proposed feature under a bag of words model on three public RGB-D datasets: one-shot learning Chalearn Gesture Dataset, Cornell Activity Dataset-60, and MSR Daily Activity 3D dataset. Experimental results show that the proposed feature outperforms other spatiotemporal features and are comparative to other state-of-the-art approaches, even though there is only one training sample for each class. © 2014 SPIE and IS&T [DOI: 10.1117/1.JEI.23.2.023017]

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

One-Shot Learning Gesture Recognition from RGB-D Data Using Bag of Features

For one-shot learning gesture recognition, two important challenges are: how to extract distinctive features and how to learn a discriminative model from only one training sample per gesture class. For feature extraction, a new spatio-temporal feature representation called 3D enhanced motion scale-invariant feature transform (3D EMoSIFT) is proposed, which fuses RGB-D data. Compared with other ...

متن کامل

Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

We propose Human Pose Models that represent RGB and depth images of human poses independent of clothing textures, backgrounds, lighting conditions, body shapes and camera viewpoints. Learning such universal models requires training images where all factors are varied for every human pose. Capturing such data is prohibitively expensive. Therefore, we develop a framework for synthesizing the trai...

متن کامل

Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study

Despite considerable enhances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...

متن کامل

3D Reconstruction of a Landslide by Application of UAV & Structure from Motion

Landslides exhibit complex geomorphologies and are very difficult to measure. Photogrammetric methods are promising tools to overcome such problems by reconstructing 3D from overlapping images of the surface. Airborne and terrestrial image acquisition platforms are possible data sources for comprehensive digital landslide modelling. This study presents a computer vision application of the struc...

متن کامل

Algorithms for People Re-identification from Rgb-d Videos Exploiting Skeletal Information

In this thesis, a novel methodology to face the people re-identification problem is proposed. Re-identification is a complex research topic in Computer Vision representing a fundamental issue, especially for intelligent video surveillance applications. Its goal is to determine the occurrences of the same person in different video sequences or images, usually by choosing from a high number of ca...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • J. Electronic Imaging

دوره 23  شماره 

صفحات  -

تاریخ انتشار 2014